In this document, we use brms to explore different ways of modeling user betting decisions. Users select a bet on each trial from 1 to 1000 coins where the expected payoff is inversely proportional to the probability of winning the bet. The optimal bet should be a linear function of the ground truth probability of victory which is encoded in the uncertainty visualization shown on each trail. We want to know how biased and how noisy this betting behavior is under different visualization conditions.

Prepare Data

We load worker responses from our pilot

# read in data 
responses_df <- read_csv("pilot-anonymous.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   workerId = col_character(),
##   condition = col_character(),
##   batch = col_integer(),
##   counterbalance = col_integer(),
##   numeracy = col_integer(),
##   bet = col_integer(),
##   outcome = col_character(),
##   sdDiff = col_integer(),
##   trial = col_integer(),
##   trialIdx = col_integer(),
##   repaired = col_character()
## )
## See spec(...) for full column specifications.
# rename to convert away from camel case
responses_df <- responses_df %>%
  rename(
    ground_truth=groundTruth,
    sd_diff=sdDiff,
    worker_id=workerId,
    start_time=startTime,
    resp_time=respTime,
    trial_dur=trialDur
  ) %>%
  mutate(
    trial_dur = ifelse(trial_dur < 0, 0, trial_dur), # avoid negative trial durations from faulty reconstruction (only one case)
    cles = ifelse(cles == 0, 0.25, cles),            # avoid responses equal to zero
    cles = ifelse(cles == 100, 99.75, cles),         # avoid responses equal to one-hundred
    bet = ifelse(bet == 1000, 999.75, bet)           # avoid responses equal to one-thousand
  ) 

head(responses_df)
## # A tibble: 6 x 22
##   worker_id condition batch counterbalance totalBonus duration numeracy
##   <chr>     <chr>     <int>          <int>      <dbl>    <dbl>    <int>
## 1 7553db84  HOPs          5              5      0.858     687.        6
## 2 7553db84  HOPs          5              5      0.858     687.        6
## 3 7553db84  HOPs          5              5      0.858     687.        6
## 4 7553db84  HOPs          5              5      0.858     687.        6
## 5 7553db84  HOPs          5              5      0.858     687.        6
## 6 7553db84  HOPs          5              5      0.858     687.        6
## # ... with 15 more variables: bet <dbl>, bonus <dbl>, cles <dbl>,
## #   ground_truth <dbl>, keep <dbl>, outcome <chr>, pay <dbl>,
## #   resp_time <dbl>, sd_diff <int>, start_time <dbl>, trial <int>,
## #   trial_dur <dbl>, trialIdx <int>, win <dbl>, repaired <chr>

We also load the data that was used to generate the stimuli that users saw in the pilot.

# data used to create stimuli
load("./conds_df.Rda")

Now, we’ll also process this data to prepare it for modeling.

# calcate the difference in draws for the heuristic functions
draw_differences <- conds_df %>% select(condition, Team, draws) %>% 
  spread(Team, draws) %>% 
  unnest() %>% 
  mutate(
    draws_diff=B - A, 
    A=NULL, 
    B=NULL
  ) %>% 
  group_by(condition) %>% 
  summarise(draws_diff = list(draws_diff[1:50]))

# reformat data conditions df
stimuli_data_df <- conds_df %>% 
  filter(Team %in% "A") %>% # drop duplicate rows for two teams
  left_join(draw_differences, by='condition') %>%
  mutate( # drop unnecessary columns
    condition=NULL,
    Team=NULL, 
    draws=NULL,
    draw_n=NULL,
    quantiles=NULL,
    sample_n=NULL
  )

# repeat heuristics data frame for each worker 
stimuli_data_df <- stimuli_data_df[rep(seq_len(nrow(stimuli_data_df)), times=length(unique(responses_df$worker_id))),]
stimuli_data_df$worker_id <- sort(rep(unique(responses_df$worker_id), each=(length(unique(responses_df$ground_truth))) * length(unique(responses_df$sd_diff))))

# calculate the baseline of relative mean difference heuristic)
stimuli_data_df$max_abs_mean_diff <- max(abs(stimuli_data_df$mean_diff))

We need the data in a format where it is prepared for modeling. We need to get the stimuli-generating data and the worker response data in a single dataframe with one row per worker * trial and convert bet and ground truth units to log odds.

# create data frame for model by merging stimuli-generating data with responses
model_df_llo <- stimuli_data_df %>%
  mutate( # create rounded version of ground_truth to merge on, leaving unrounded value stored in odds_of_victory
    ground_truth=round(odds_of_victory, 3)
  ) %>%
  full_join(responses_df, by=c("worker_id", "sd_diff", "ground_truth")) %>%
  rename( # rename ground_truth columns, so it is clear which is rounded and which should be used in the model
    ground_truth_rounded=ground_truth,
    ground_truth=odds_of_victory
  ) %>% 
  mutate( # apply logit function to cles judgments and ground truth
    lo_bet=qlogis(bet / 1000),
    lo_ground_truth=qlogis(ground_truth),
    sd_diff=as.factor(sd_diff)
  )

Fit the LLO vs Constant Response Mixure to Betting Data

Recall from our exploratory visualization analysis that in the betting data we see a set of different betting strategies. Some workers bet the minimum, maximum, or middle amount every trial. This behavior is captured by a constant response strategy. Very few workers seem to employ the optimal betting strategy where the bet amount is a linear function of the probability of winning. Since the perception of probability is biased (i.e., an inverted sigmoid pi function), responses in this strategy should look like CLES judgments but maybe noisier. Others seem to have some kind of probability threshold where they bet the minimum amount below that threshold and the maximum amount above that threshold (i.e., a sigmoid psychometric function). Both the optimal betting strategy and the threshold strategy can be parameterized within the LLO frame work because the shape of the LLO model is sigmoidal when slope > 1 and an inverse sigmoid where slope < 1. Thus, the patterns we observer in betting behavior should be accounted for by a mixture between LLO and constant response submodels.

Thankfully, we’ve already build such a model for our CLES responses, and we can try to apply it to our betting data by treating bets as a probability judgment (which they implicitly are in this task).

# define stanvars for multi_normal prior on condition effects
stanvars <- stanvar(rep(1, 3), "mu_theta2", scode = "  vector[3] mu_theta2;") +
  stanvar(diag(3), "Sigma_theta2", scode = "  matrix[3, 3] Sigma_theta2;")
# fit the model
m.bet.llo_mix <- brm(
   bf(lo_bet ~ 1, 
    mu1 ~ (1 + lo_ground_truth|worker_id) + lo_ground_truth:condition, # llo model
    mu2 ~ (1|worker_id), # random constant response per worker (to account for people who always answer the same, often but not always 50%)
    theta2 ~ (1|worker_id) + 0 + condition # the proportion of responses that are constant in each condition
  ),
  data = model_df_llo,
  family = mixture(gaussian, gaussian, order = 'mu'),
  prior = c(
    prior(normal(0, 1), class = Intercept, dpar = mu1),
    prior(normal(0, 1), class = Intercept, dpar = mu2),
    prior("multi_normal(mu_theta2, Sigma_theta2)", class = b, dpar = theta2)
  ),
  stanvars = stanvars,
  inits = 1, chains = 2, cores = 2,
  control = list(adapt_delta = 0.999, max_treedepth=15),
  file = "stan/m_bet_llo_mix"
)

Check diagnostics:

# trace plots
plot(m.bet.llo_mix)

# pairs plot (too many things to view at once, so we've grouped them)
# mixture proportions
pairs(m.bet.llo_mix, pars = c("sd_worker_id__theta2_Intercept",
                              "b_theta2_conditionHOPs",
                              "b_theta2_conditionmeans_only",
                              "b_theta2_conditionintervals_w_means"))

# pairs plot (too many things to view at once, so we've grouped them)
# hyperparameters
pairs(m.bet.llo_mix, pars = c("b_mu1_Intercept",
                               "sd_worker_id__mu1_Intercept",
                               "sd_worker_id__mu1_lo_ground_truth",
                               "cor_worker_id__mu1_Intercept__mu1_lo_ground_truth",
                               "sigma1",
                               "b_mu2_Intercept",
                               "sd_worker_id__mu2_Intercept",
                               "sigma2"
                              ))

# pairs plot (too many things to view at once, so we've grouped them)
# slope effects
pairs(m.bet.llo_mix, pars = c("b_mu1_lo_ground_truth:conditionHOPs",
                               "b_mu1_lo_ground_truth:conditionmeans_only",
                               "b_mu1_lo_ground_truth:conditionintervals_w_means"))

# model summary
print(m.bet.llo_mix)
## Warning: The model has not converged (some Rhats are > 1.1). Do not analyse the results! 
## We recommend running more iterations and/or setting stronger priors.
##  Family: mixture(gaussian, gaussian) 
##   Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity; theta1 = identity; theta2 = identity 
## Formula: lo_bet ~ 1 
##          mu1 ~ (1 + lo_ground_truth | worker_id) + lo_ground_truth:condition
##          mu2 ~ (1 | worker_id)
##          theta2 ~ (1 | worker_id) + 0 + condition
##    Data: model_df_llo (Number of observations: 780) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 39) 
##                                        Estimate Est.Error l-95% CI
## sd(mu1_Intercept)                          2.08      0.15     1.93
## sd(mu1_lo_ground_truth)                    0.52      0.41     0.11
## sd(mu2_Intercept)                          0.00      0.00     0.00
## sd(theta2_Intercept)                       0.16      0.11     0.05
## cor(mu1_Intercept,mu1_lo_ground_truth)     0.62      0.02     0.59
##                                        u-95% CI Eff.Sample    Rhat
## sd(mu1_Intercept)                          2.23          1  473.60
## sd(mu1_lo_ground_truth)                    0.93          1 3690.84
## sd(mu2_Intercept)                          0.00          1  564.82
## sd(theta2_Intercept)                       0.28          1 3371.61
## cor(mu1_Intercept,mu1_lo_ground_truth)     0.64          1  221.08
## 
## Population-Level Effects: 
##                                                Estimate Est.Error l-95% CI
## mu1_Intercept                                     -0.07      0.04    -0.10
## mu2_Intercept                                     -0.00      0.00    -0.00
## mu1_lo_ground_truth:conditionHOPs                  1.47      0.06     1.41
## mu1_lo_ground_truth:conditionintervals_w_means     1.09      0.02     1.07
## mu1_lo_ground_truth:conditionmeans_only            1.13      0.08     1.05
## theta2_conditionHOPs                              -2.14      0.00    -2.14
## theta2_conditionintervals_w_means                 -1.66      0.06    -1.72
## theta2_conditionmeans_only                        -1.49      0.07    -1.56
##                                                u-95% CI Eff.Sample    Rhat
## mu1_Intercept                                     -0.03          1 5943.05
## mu2_Intercept                                     -0.00          1 1064.94
## mu1_lo_ground_truth:conditionHOPs                  1.53          1  415.35
## mu1_lo_ground_truth:conditionintervals_w_means     1.11          1  174.23
## mu1_lo_ground_truth:conditionmeans_only            1.22          1  682.79
## theta2_conditionHOPs                              -2.14          1    5.04
## theta2_conditionintervals_w_means                 -1.60          1  368.85
## theta2_conditionmeans_only                        -1.42          1  344.26
## 
## Family Specific Parameters: 
##        Estimate Est.Error l-95% CI u-95% CI Eff.Sample   Rhat
## sigma1     3.25      0.21     3.04     3.46          1 652.71
## sigma2     0.00      0.00     0.00     0.00          2   4.30
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s check out a posterior predictive distribution for bets.

# posterior predictive check
model_df_llo %>%
  select(lo_ground_truth,condition,sd_diff,worker_id) %>%
  add_predicted_draws(m.bet.llo_mix, prediction = "lo_bet", seed = 1234) %>%
  mutate(post_bet = plogis(lo_bet)) %>%
  ggplot(aes(x=post_bet)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for bets") +
  theme(panel.grid = element_blank())

How does this compare to the actual distribution of bet amounts?

# posterior predictive check
model_df_llo %>%
  ggplot(aes(x=bet)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Empirical distribution of bets") +
  theme(panel.grid = element_blank())

What do the posterior for the effect of each visualization condition look like?

# use posterior samples to define distributions for the slope in each visualization condition
posterior_samples(m.bet.llo_mix) %>%
  transmute(slope_HOPs = `b_mu1_lo_ground_truth:conditionHOPs`,
            slope_intervals_w_means = `b_mu1_lo_ground_truth:conditionintervals_w_means`,
            slope_means_only = `b_mu1_lo_ground_truth:conditionmeans_only`) %>%
  gather(key, value) %>%
  ggplot(aes(x = value, group = key, color = key, fill = key)) +
  geom_density(alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes by visualization condition") +
  theme(panel.grid = element_blank())

Let’s take a look at some of the estimated linear models per visualization condition.

# this time we'll adopt functions from the tidybayes package to make plotting posterior predictions easier
model_df_llo %>%
  group_by(condition,sd_diff,worker_id) %>%
  data_grid(lo_ground_truth = seq_range(lo_ground_truth, n = 101)) %>%
  add_predicted_draws(m.bet.llo_mix) %>%
  ggplot(aes(x = lo_ground_truth, y = lo_bet, color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(data = model_df_llo, alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(model_df_llo$lo_ground_truth, c(0, 1)),
                  ylim = quantile(model_df_llo$lo_bet, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_grid(sd_diff ~ condition)

What does this look like in probability units?

model_df_llo %>%
  group_by(condition,sd_diff,worker_id) %>%
  data_grid(lo_ground_truth = seq_range(lo_ground_truth, n = 101)) %>%
  add_predicted_draws(m.bet.llo_mix) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_bet), color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(data = model_df_llo, alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(plogis(model_df_llo$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df_llo$lo_bet), c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_grid(sd_diff ~ condition)

What about the mixture proportions? Let’s plot the posterior for theta. Because theta is in log odds units we’ll transform it into probability units.

# posteriors of mixture proportion
posterior_samples(m.bet.llo_mix) %>%
  transmute(p_mix_HOPs = plogis(b_theta2_conditionHOPs),
            p_mix_intervals_w_means = plogis(b_theta2_conditionintervals_w_means),
            p_mix_means_only = plogis(b_theta2_conditionmeans_only)) %>%
  gather(key, value) %>%
  ggplot(aes(x = value, group = key, color = key, fill = key)) +
  geom_density(alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for proportion of constant response by visualization condition") +
  theme(panel.grid = element_blank())

This model has all sorts of issues for our betting data, so let’s revert to something simpler.

Back to the LLO Model

Since the mixture fit is terrible, let’s try just the LLO model.

# fit the model
m.bet.llo <- brm(
   bf(lo_bet ~ (1 + lo_ground_truth|worker_id) + 0 + condition + lo_ground_truth:condition # llo model
  ),
  data = model_df_llo,
  inits = 1, chains = 2, cores = 2,
  control = list(adapt_delta = 0.999, max_treedepth=15),
  file = "stan/m_bet_llo"
)

Check diagnostics:

# trace plots
plot(m.bet.llo)

# pairs plot (too many things to view at once, so we've grouped them)
# hyperparameters
pairs(m.bet.llo, pars = c("b_Intercept",
                          "sd_worker_id__Intercept",
                          "sd_worker_id__lo_ground_truth",
                          "cor_worker_id__Intercept__lo_ground_truth",
                          "sigma"))

# pairs plot (too many things to view at once, so we've grouped them)
# slope and intercept effects
pairs(m.bet.llo, pars = c("b_conditionHOPs",
                          "b_conditionmeans_only",
                          "b_conditionintervals_w_means",
                          "b_lo_ground_truth:conditionHOPs",
                          "b_lo_ground_truth:conditionmeans_only",
                          "b_lo_ground_truth:conditionintervals_w_means"))

# model summary
print(m.bet.llo)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: lo_bet ~ (1 + lo_ground_truth | worker_id) + 0 + condition + lo_ground_truth:condition 
##    Data: model_df_llo (Number of observations: 780) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 39) 
##                                Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept)                      2.31      0.32     1.77     3.02
## sd(lo_ground_truth)                0.81      0.11     0.62     1.06
## cor(Intercept,lo_ground_truth)     0.24      0.17    -0.11     0.54
##                                Eff.Sample Rhat
## sd(Intercept)                         336 1.01
## sd(lo_ground_truth)                   417 1.00
## cor(Intercept,lo_ground_truth)        171 1.01
## 
## Population-Level Effects: 
##                                            Estimate Est.Error l-95% CI
## conditionHOPs                                  0.08      0.64    -1.16
## conditionintervals_w_means                    -0.08      0.71    -1.42
## conditionmeans_only                           -0.14      0.66    -1.39
## conditionHOPs:lo_ground_truth                  1.19      0.24     0.72
## conditionintervals_w_means:lo_ground_truth     0.96      0.25     0.44
## conditionmeans_only:lo_ground_truth            1.05      0.25     0.56
##                                            u-95% CI Eff.Sample Rhat
## conditionHOPs                                  1.36        158 1.00
## conditionintervals_w_means                     1.34        217 1.00
## conditionmeans_only                            1.12        246 1.00
## conditionHOPs:lo_ground_truth                  1.67        218 1.01
## conditionintervals_w_means:lo_ground_truth     1.43        224 1.02
## conditionmeans_only:lo_ground_truth            1.55        304 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     2.95      0.08     2.81     3.10       1276 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s check out a posterior predictive distribution for bets.

# posterior predictive check
model_df_llo %>%
  select(lo_ground_truth,condition,sd_diff,worker_id) %>%
  add_predicted_draws(m.bet.llo, prediction = "lo_bet", seed = 1234) %>%
  mutate(post_bet = plogis(lo_bet)) %>%
  ggplot(aes(x=post_bet)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for bets") +
  theme(panel.grid = element_blank())

How does this compare to the actual distribution of bet amounts?

# posterior predictive check
model_df_llo %>%
  ggplot(aes(x=bet)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Empirical distribution of bets") +
  theme(panel.grid = element_blank())

What do the posterior for the slope of each visualization condition look like?

# use posterior samples to define distributions for the slope in each visualization condition
posterior_samples(m.bet.llo) %>%
  transmute(slope_HOPs = `b_conditionHOPs:lo_ground_truth`,
            slope_intervals_w_means = `b_conditionintervals_w_means:lo_ground_truth`,
            slope_means_only = `b_conditionmeans_only:lo_ground_truth`) %>%
  gather(key, value) %>%
  ggplot(aes(x = value, group = key, color = key, fill = key)) +
  geom_density(alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes by visualization condition") +
  theme(panel.grid = element_blank())

What about the effects on intercepts?

# use posterior samples to define distributions for the slope in each visualization condition
posterior_samples(m.bet.llo) %>%
  transmute(intercept_HOPs = b_conditionHOPs,
            intercept_intervals_w_means = b_conditionintervals_w_means,
            intercept_means_only = b_conditionmeans_only) %>%
  gather(key, value) %>%
  ggplot(aes(x = value, group = key, color = key, fill = key)) +
  geom_density(alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for intercepts by visualization condition") +
  theme(panel.grid = element_blank())

Let’s take a look at some of the estimated linear models per visualization condition.

# this time we'll adopt functions from the tidybayes package to make plotting posterior predictions easier
model_df_llo %>%
  group_by(condition,worker_id) %>%
  data_grid(lo_ground_truth = seq_range(lo_ground_truth, n = 101)) %>%
  add_predicted_draws(m.bet.llo) %>%
  ggplot(aes(x = lo_ground_truth, y = lo_bet, color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(data = model_df_llo, alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(model_df_llo$lo_ground_truth, c(0, 1)),
                  ylim = quantile(model_df_llo$lo_bet, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_grid(. ~ condition)

What does this look like in probability units?

model_df_llo %>%
  group_by(condition,worker_id) %>%
  data_grid(lo_ground_truth = seq_range(lo_ground_truth, n = 101)) %>%
  add_predicted_draws(m.bet.llo) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_bet), color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(data = model_df_llo, alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(plogis(model_df_llo$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df_llo$lo_bet), c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_grid(. ~ condition)

This model looks alright, but there’s a lot of variability which is not accounted for by the model.

Adding Predictors to Deal with the Effect of Bet and Outcome on Previous Trial

Next, we’ll try to account for some of the apparent noise in responses by modeling ways that responses might depend on the previous trial. Based on Prospect Theory, we should expect that betting behavior will depend on the payoff on the previous trial in terms of both the betting behavior and the outcome which produced that payoff. Lower pay on the previous trial resulting from a win on a small bet would result in a higher bet on the next trail. Conversely, lower pay on the previous trial resulting from a loss on a large bet would result in a lower bet on the next trial. However, it is ambiguous whether higher pay on the last trial resulting from a win on a large bet would result in a lower or higher bet on the next trail. In any case, it’s worth investigating whether including information about the previous bet can improve our model.

Add Columns to Model Dataframe for Previous Bet and Outcome

# add columns for lo_bet and outcome on previous trial
model_df_llo <- model_df_llo %>%
  group_by(worker_id) %>%
  mutate(
    prev_lo_bet = lag(lo_bet, order_by = trial),
    prev_outcome = lag(as.logical(outcome), order_by = trial)
  ) %>%
  replace_na(list(prev_lo_bet = 0, prev_outcome= FALSE)) # consider setting this using practice trials
# model_df_llo %>%
#   select(worker_id, trial, lo_bet, prev_lo_bet, outcome, prev_outcome) %>%
#   arrange(worker_id,trial)

Fit the Model

In this model, we separate parameters for worker effects, visualization effects, and effects of the previous trial into separate submodels in order to get more interpretable intercepts. Otherwise, all we’ve done is add effects for the previous trail on top of the parameters in the last model.

# fit the model
m.bet.llo_prev <- brm(
   bf(lo_bet ~ worker + vis + trial,
      worker ~ (1 + lo_ground_truth|worker_id),
      vis~ 0 + condition + lo_ground_truth:condition,
      trial ~ prev_lo_bet*prev_outcome,
      nl = TRUE
  ),
  data = model_df_llo,
  prior = c(
    prior(normal(0, 1), class = b, nlpar = worker),
    prior(normal(0, 1), class = b, nlpar = vis),
    prior(normal(0, 1), class = b, nlpar = trial)
  ),
  inits = 1, chains = 2, cores = 2,
  control = list(adapt_delta = 0.999, max_treedepth=15),
  file = "stan/m_bet_llo_prev"
)

Check diagnostics:

  • Trace plots
# trace plots
plot(m.bet.llo_prev)

  • Pairs plot
# pairs plot (too many things to view at once, so we've grouped them)
# hyperparameters
pairs(m.bet.llo_prev, pars = c("b_worker_Intercept",
                               "sd_worker_id__worker_Intercept",
                               "sd_worker_id__worker_lo_ground_truth",
                               "cor_worker_id__worker_Intercept__worker_lo_ground_truth",
                               "sigma"))

# pairs plot (too many things to view at once, so we've grouped them)
# slope and intercept effects
pairs(m.bet.llo_prev, pars = c("b_vis_conditionHOPs",
                               "b_vis_conditionmeans_only",
                               "b_vis_conditionintervals_w_means"))

# pairs plot (too many things to view at once, so we've grouped them)
# previous trial outcomes
pairs(m.bet.llo_prev, pars = c("b_trial_Intercept",
                               "b_trial_prev_lo_bet",
                               "b_trial_prev_lo_bet:prev_outcomeTRUE",
                               "b_trial_prev_outcomeTRUE"))

  • Summary
# model summary
print(m.bet.llo_prev)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: lo_bet ~ worker + vis + trial 
##          worker ~ (1 + lo_ground_truth | worker_id)
##          vis ~ 0 + condition + lo_ground_truth:condition
##          trial ~ prev_lo_bet * prev_outcome
##    Data: model_df_llo (Number of observations: 780) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 39) 
##                                              Estimate Est.Error l-95% CI
## sd(worker_Intercept)                             2.26      0.31     1.74
## sd(worker_lo_ground_truth)                       0.82      0.12     0.63
## cor(worker_Intercept,worker_lo_ground_truth)     0.25      0.18    -0.11
##                                              u-95% CI Eff.Sample Rhat
## sd(worker_Intercept)                             2.92        472 1.00
## sd(worker_lo_ground_truth)                       1.08        481 1.00
## cor(worker_Intercept,worker_lo_ground_truth)     0.58        321 1.00
## 
## Population-Level Effects: 
##                                                Estimate Est.Error l-95% CI
## worker_Intercept                                   0.05      0.76    -1.40
## vis_conditionHOPs                                  0.10      0.69    -1.24
## vis_conditionintervals_w_means                    -0.04      0.69    -1.34
## vis_conditionmeans_only                           -0.08      0.72    -1.55
## vis_conditionHOPs:lo_ground_truth                  1.12      0.23     0.66
## vis_conditionintervals_w_means:lo_ground_truth     0.89      0.23     0.43
## vis_conditionmeans_only:lo_ground_truth            1.00      0.24     0.49
## trial_Intercept                                    0.02      0.72    -1.40
## trial_prev_lo_bet                                  0.07      0.04    -0.01
## trial_prev_outcomeTRUE                            -0.16      0.24    -0.65
## trial_prev_lo_bet:prev_outcomeTRUE                -0.08      0.05    -0.18
##                                                u-95% CI Eff.Sample Rhat
## worker_Intercept                                   1.47       1285 1.00
## vis_conditionHOPs                                  1.42        586 1.00
## vis_conditionintervals_w_means                     1.29        687 1.00
## vis_conditionmeans_only                            1.26        651 1.00
## vis_conditionHOPs:lo_ground_truth                  1.57        401 1.00
## vis_conditionintervals_w_means:lo_ground_truth     1.35        493 1.00
## vis_conditionmeans_only:lo_ground_truth            1.46        460 1.00
## trial_Intercept                                    1.48       1199 1.00
## trial_prev_lo_bet                                  0.15       1121 1.00
## trial_prev_outcomeTRUE                             0.31       1956 1.00
## trial_prev_lo_bet:prev_outcomeTRUE                 0.03       1295 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     2.96      0.08     2.80     3.12       1852 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s check out a posterior predictive distribution for bets.

# posterior predictive check
model_df_llo %>%
  select(lo_ground_truth,condition,sd_diff,worker_id,prev_lo_bet,prev_outcome) %>%
  add_predicted_draws(m.bet.llo_prev, prediction = "lo_bet", seed = 1234) %>%
  mutate(post_bet = plogis(lo_bet)) %>%
  ggplot(aes(x=post_bet)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for bets") +
  theme(panel.grid = element_blank())

How does this compare to the actual distribution of bet amounts?

# posterior predictive check
model_df_llo %>%
  ggplot(aes(x=bet)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Empirical distribution of bets") +
  theme(panel.grid = element_blank())

What do the posterior for the slope of each visualization condition look like?

# use posterior samples to define distributions for the slope in each visualization condition
posterior_samples(m.bet.llo_prev) %>%
  transmute(slope_HOPs = `b_vis_conditionHOPs:lo_ground_truth`,
            slope_intervals_w_means = `b_vis_conditionintervals_w_means:lo_ground_truth`,
            slope_means_only = `b_vis_conditionmeans_only:lo_ground_truth`) %>%
  gather(key, value) %>%
  ggplot(aes(x = value, group = key, color = key, fill = key)) +
  geom_density(alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes by visualization condition") +
  theme(panel.grid = element_blank())

What about the effects on intercepts?

# use posterior samples to define distributions for the intercept in each visualization condition
posterior_samples(m.bet.llo_prev) %>%
  transmute(intercept_HOPs = b_vis_conditionHOPs,
            intercept_intervals_w_means = b_vis_conditionintervals_w_means,
            intercept_means_only = b_vis_conditionmeans_only) %>%
  gather(key, value) %>%
  ggplot(aes(x = value, group = key, color = key, fill = key)) +
  geom_density(alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for intercepts by visualization condition") +
  theme(panel.grid = element_blank())

Let’s take a look at some of the estimated linear models per visualization condition.

# need to extend memory limit in .Renviron to run this (https://stackoverflow.com/questions/51295402/r-on-macos-error-vector-memory-exhausted-limit-reached)
model_df_llo %>%
  group_by(condition,worker_id) %>%
  data_grid(
    lo_ground_truth = seq_range(lo_ground_truth, n = 21),
    prev_lo_bet = seq_range(prev_lo_bet, n = 21),
    prev_outcome = c(TRUE, FALSE)) %>%
  add_predicted_draws(m.bet.llo_prev, n = 500) %>%
  ggplot(aes(x = lo_ground_truth, y = lo_bet, color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(data = model_df_llo, alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(model_df_llo$lo_ground_truth, c(0, 1)),
                  ylim = quantile(model_df_llo$lo_bet, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_grid(. ~ condition)

What does this look like in probability units?

model_df_llo %>%
  group_by(condition,sd_diff,worker_id) %>%
  data_grid(
    lo_ground_truth = seq_range(lo_ground_truth, n = 21),
    prev_lo_bet = seq_range(prev_lo_bet, n = 21),
    prev_outcome = c(TRUE, FALSE)) %>%
  add_predicted_draws(m.bet.llo_prev, n = 500) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_bet), color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(data = model_df_llo, alpha = 0.3) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  coord_cartesian(xlim = quantile(plogis(model_df_llo$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df_llo$lo_bet), c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_grid(sd_diff ~ condition)

Let’s also look at the posteriors for the effects of the previous trial.

# use posterior samples to define distributions for the intercept in each visualization condition
posterior_samples(m.bet.llo_prev) %>%
  transmute(slope_prev_bet_lose = b_trial_prev_lo_bet,
            slope_prev_bet_win = b_trial_prev_lo_bet + `b_trial_prev_lo_bet:prev_outcomeTRUE`,
            intercept_lose = b_trial_Intercept,
            intercept_win = b_trial_prev_outcomeTRUE
            ) %>%
  gather(key, value) %>%
  ggplot(aes(x = value, group = key, color = key, fill = key)) +
  geom_density(alpha = 0.3) +
  # scale_fill_brewer(type = "qual", palette = 1) +
  # scale_color_brewer(type = "qual", palette = 1) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for intercepts by visualization condition") +
  theme(panel.grid = element_blank())

The problem with this model is that the effect of the previous trial is small and does not change the difference between the posterior predictive distribution of bets under the LLO model and the empirical distribution of bets. In particular, the model does not caputure the modal bet near 500 (the middle of the scale).

Did we really get anything out of adding these previous trial predictors to the model? Let’s compare the two versions of the LLO model that we’ve fit for bets based on widely applicable information criteria.

waic(m.bet.llo, m.bet.llo_prev)
##                               WAIC    SE
## m.bet.llo                  3979.98 52.78
## m.bet.llo_prev             3985.95 52.43
## m.bet.llo - m.bet.llo_prev   -5.97  4.68

It looks like the model without the predictors for the previous trial performed better.

Ordinal Models of Betting

Since the distribution of bets is trimodal, and we cannot seem to account for the center mode in the LLO model, let’s treat user responses as reflecting a choice between three strategies: bet low, bet middle, and bet high.

Categorizing Bets

We’ll split bets into three categories: low, middle, and high. We’ll use split points at 334 and 667 on a scale from 1 to 1000.

# add column for ordinal_bet
model_df_ordinal <- model_df_llo %>%
  mutate(
    ordinal_bet = if_else(bet < 334, 1, if_else(bet < 667, 2, 3)) # {1: low, 2: mid, 3, high}
  )

We’re going to try five different forms of ordinal regression and compare them using WAIC in order to be confident that we’ve made an appropriate choice of model.

Fit the Cumulative Model

This model is has a similae structure to our LLO model, but it models bets as log odds of ordinal responses rather than a log odds transform of the proportion of the budget allocated to the bet.

However, there are multiple ways we could set up an ordinal model for this data. We’ll try a few different types of ordinal model, starting with a cumulative model which assumes that bets reflect an unobserved latent percept (i.e., utility). Importantly, this model assumes that predictors have the same effect across all categories of responses and that each condition has equal variances.

# fit the model
m.bet.ordinal <- brm(
  bf(ordinal_bet ~ (1 + lo_ground_truth|worker_id) + condition * lo_ground_truth),
  data = model_df_ordinal,
  family = cumulative("logit"),
  inits = 1, chains = 2, cores = 2,
  control = list(adapt_delta = 0.999, max_treedepth=15),
  file = "stan/m_bet_ordinal"
)

Check diagnostics:

  • Trace plots
# trace plots
plot(m.bet.ordinal)

  • Pairs plot
# pairs plot (too many things to view at once, so we've grouped them)
# hyperparameters
pairs(m.bet.ordinal, pars = c("sd_worker_id__Intercept",
                              "sd_worker_id__lo_ground_truth",
                              "cor_worker_id__Intercept__lo_ground_truth"))

# pairs plot (too many things to view at once, so we've grouped them)
# threshold effects
pairs(m.bet.ordinal, pars = c("b_Intercept",
                              "b_lo_ground_truth", # HOPs condition
                              "b_conditionmeans_only",
                              "b_conditionmeans_only:lo_ground_truth",
                              "b_conditionintervals_w_means",
                              "b_conditionintervals_w_means:lo_ground_truth"))

  • Summary
# model summary
print(m.bet.ordinal)
##  Family: cumulative 
##   Links: mu = logit; disc = identity 
## Formula: ordinal_bet ~ (1 + lo_ground_truth | worker_id) + condition * lo_ground_truth 
##    Data: model_df_ordinal (Number of observations: 780) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 39) 
##                                Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept)                      2.35      0.39     1.72     3.24
## sd(lo_ground_truth)                0.83      0.15     0.58     1.18
## cor(Intercept,lo_ground_truth)     0.07      0.20    -0.34     0.46
##                                Eff.Sample Rhat
## sd(Intercept)                         407 1.00
## sd(lo_ground_truth)                   489 1.00
## cor(Intercept,lo_ground_truth)        529 1.00
## 
## Population-Level Effects: 
##                                            Estimate Est.Error l-95% CI
## Intercept[1]                                  -0.25      0.71    -1.65
## Intercept[2]                                   2.11      0.71     0.68
## conditionintervals_w_means                     0.32      0.98    -1.66
## conditionmeans_only                           -0.01      1.04    -2.17
## lo_ground_truth                                1.08      0.25     0.62
## conditionintervals_w_means:lo_ground_truth    -0.29      0.35    -1.02
## conditionmeans_only:lo_ground_truth           -0.06      0.35    -0.79
##                                            u-95% CI Eff.Sample Rhat
## Intercept[1]                                   1.16        338 1.01
## Intercept[2]                                   3.58        353 1.01
## conditionintervals_w_means                     2.33        392 1.00
## conditionmeans_only                            2.14        331 1.01
## lo_ground_truth                                1.57        502 1.01
## conditionintervals_w_means:lo_ground_truth     0.40        465 1.00
## conditionmeans_only:lo_ground_truth            0.60        369 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Based on the pairs plots, it looks like the thresholds (i.e., intercepts) are highly correlated. This is probably an issue.

Fit the Adjacent Category Model

The next type of ordinal model we’ll try is an adjacent category model. We’ll leave our assumptions consistent with the previous model to see if changing the type of model impacts fit. We assume that bets reflect an unobserved latent percept (i.e., utility), that predictors have the same effect across all categories of responses and that each condition has equal variances.

# fit the model
m.bet.ordinal2 <- brm(
  bf(ordinal_bet ~ (1 + lo_ground_truth|worker_id) + condition * lo_ground_truth),
  data = model_df_ordinal,
  family = acat("logit"),
  inits = 1, chains = 2, cores = 2,
  control = list(adapt_delta = 0.999, max_treedepth=15),
  file = "stan/m_bet_ordinal2"
)

Check diagnostics:

  • Trace plots
# trace plots
plot(m.bet.ordinal2)

  • Pairs plot
# pairs plot (too many things to view at once, so we've grouped them)
# hyperparameters
pairs(m.bet.ordinal2, pars = c("sd_worker_id__Intercept",
                              "sd_worker_id__lo_ground_truth",
                              "cor_worker_id__Intercept__lo_ground_truth"))

# pairs plot (too many things to view at once, so we've grouped them)
# threshold effects
pairs(m.bet.ordinal2, pars = c("b_Intercept",
                              "b_lo_ground_truth", # HOPs condition
                              "b_conditionmeans_only",
                              "b_conditionmeans_only:lo_ground_truth",
                              "b_conditionintervals_w_means",
                              "b_conditionintervals_w_means:lo_ground_truth"))

  • Summary
# model summary
print(m.bet.ordinal2)
##  Family: acat 
##   Links: mu = logit; disc = identity 
## Formula: ordinal_bet ~ (1 + lo_ground_truth | worker_id) + condition * lo_ground_truth 
##    Data: model_df_ordinal (Number of observations: 780) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 39) 
##                                Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept)                      1.93      0.33     1.41     2.65
## sd(lo_ground_truth)                0.70      0.13     0.48     0.98
## cor(Intercept,lo_ground_truth)     0.07      0.20    -0.32     0.43
##                                Eff.Sample Rhat
## sd(Intercept)                         592 1.00
## sd(lo_ground_truth)                   641 1.00
## cor(Intercept,lo_ground_truth)        689 1.00
## 
## Population-Level Effects: 
##                                            Estimate Est.Error l-95% CI
## Intercept[1]                                   0.07      0.56    -0.92
## Intercept[2]                                   1.48      0.57     0.46
## conditionintervals_w_means                     0.26      0.79    -1.30
## conditionmeans_only                            0.05      0.85    -1.59
## lo_ground_truth                                0.89      0.23     0.45
## conditionintervals_w_means:lo_ground_truth    -0.22      0.32    -0.83
## conditionmeans_only:lo_ground_truth           -0.01      0.31    -0.63
##                                            u-95% CI Eff.Sample Rhat
## Intercept[1]                                   1.24        416 1.00
## Intercept[2]                                   2.71        454 1.00
## conditionintervals_w_means                     1.86        469 1.00
## conditionmeans_only                            1.85        482 1.00
## lo_ground_truth                                1.33        557 1.00
## conditionintervals_w_means:lo_ground_truth     0.41        560 1.01
## conditionmeans_only:lo_ground_truth            0.61        578 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

We’re still seeing a high degree of correlation for intercept parameters in the pairs plot.

Fit the Adjacent Category Model with Category-Specific Effects

Next, we’ll try the adjacent category model with category specific effects. Now we assume that predictors have different effects on the proportion of each category of responses. We still assumpe that each condition has equal variances.

# fit the model
m.bet.ordinal3 <- brm(
  bf(ordinal_bet ~ (1 + lo_ground_truth|worker_id) + cs(condition) + lo_ground_truth + condition:lo_ground_truth), # cs() denotes category specific effects
  data = model_df_ordinal,
  family = acat("logit"),
  inits = 1, chains = 2, cores = 2,
  control = list(adapt_delta = 0.999, max_treedepth=15),
  file = "stan/m_bet_ordinal3"
)

Check diagnostics:

  • Trace plots
# trace plots
plot(m.bet.ordinal3)

  • Pairs plot
# pairs plot (too many things to view at once, so we've grouped them)
# hyperparameters
pairs(m.bet.ordinal3, pars = c("sd_worker_id__Intercept",
                              "sd_worker_id__lo_ground_truth",
                              "cor_worker_id__Intercept__lo_ground_truth"))

# pairs plot (too many things to view at once, so we've grouped them)
# threshold effects
pairs(m.bet.ordinal, pars = c("b_Intercept",
                              "b_lo_ground_truth", # HOPs condition
                              "bcs_conditionmeans_only",
                              "b_conditionmeans_only:lo_ground_truth",
                              "bcs_conditionintervals_w_means",
                              "b_conditionintervals_w_means:lo_ground_truth"))

  • Summary
# model summary
print(m.bet.ordinal3)
##  Family: acat 
##   Links: mu = logit; disc = identity 
## Formula: ordinal_bet ~ (1 + lo_ground_truth | worker_id) + cs(condition) + lo_ground_truth + condition:lo_ground_truth 
##    Data: model_df_ordinal (Number of observations: 780) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 39) 
##                                Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept)                      1.91      0.33     1.36     2.66
## sd(lo_ground_truth)                0.68      0.13     0.47     0.98
## cor(Intercept,lo_ground_truth)     0.03      0.20    -0.37     0.42
##                                Eff.Sample Rhat
## sd(Intercept)                         522 1.00
## sd(lo_ground_truth)                   736 1.00
## cor(Intercept,lo_ground_truth)        626 1.00
## 
## Population-Level Effects: 
##                                            Estimate Est.Error l-95% CI
## Intercept[1]                                   0.23      0.57    -0.92
## Intercept[2]                                   1.25      0.61     0.04
## lo_ground_truth                                0.84      0.22     0.41
## lo_ground_truth:conditionintervals_w_means    -0.10      0.29    -0.65
## lo_ground_truth:conditionmeans_only           -0.03      0.30    -0.60
## conditionintervals_w_means[1]                  0.75      0.83    -0.91
## conditionintervals_w_means[2]                 -0.36      0.86    -2.10
## conditionmeans_only[1]                        -0.07      0.86    -1.80
## conditionmeans_only[2]                         0.18      0.90    -1.65
##                                            u-95% CI Eff.Sample Rhat
## Intercept[1]                                   1.35        358 1.00
## Intercept[2]                                   2.46        339 1.00
## lo_ground_truth                                1.30        608 1.01
## lo_ground_truth:conditionintervals_w_means     0.49        618 1.00
## lo_ground_truth:conditionmeans_only            0.57        614 1.01
## conditionintervals_w_means[1]                  2.44        446 1.00
## conditionintervals_w_means[2]                  1.32        485 1.00
## conditionmeans_only[1]                         1.62        338 1.00
## conditionmeans_only[2]                         1.90        352 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Fit the Adjacent Category Model with Unequal Variances

For this next model, we revert to the assumption that predictors have the same effect on the proportion of each category of responses, but now we assume that conditions have unequal variances.

# fit the model
m.bet.ordinal4 <- brm(
  bf(ordinal_bet ~ (1 + lo_ground_truth|worker_id) + condition * lo_ground_truth) + # no category-specific effects
  lf(disc ~ 0 + condition, cmc = FALSE), # disc ~ 1 / sd
  data = model_df_ordinal,
  family = acat("logit"),
  inits = 1, chains = 2, cores = 2,
  control = list(adapt_delta = 0.999, max_treedepth=15),
  file = "stan/m_bet_ordinal4"
)

Check diagnostics:

  • Trace plots
# trace plots
plot(m.bet.ordinal4)

  • Pairs plot
# pairs plot (too many things to view at once, so we've grouped them)
# hyperparameters
pairs(m.bet.ordinal4, pars = c("sd_worker_id__Intercept",
                               "sd_worker_id__lo_ground_truth",
                               "cor_worker_id__Intercept__lo_ground_truth"))

# pairs plot (too many things to view at once, so we've grouped them)
# threshold effects
pairs(m.bet.ordinal4, pars = c("b_Intercept",
                              "b_lo_ground_truth", # HOPs condition
                              "b_conditionmeans_only",
                              "b_conditionmeans_only:lo_ground_truth",
                              "b_conditionintervals_w_means",
                              "b_conditionintervals_w_means:lo_ground_truth"))

# pairs plot (too many things to view at once, so we've grouped them)
# discriminability effects
pairs(m.bet.ordinal4, pars = c("b_disc_conditionmeans_only",
                              "b_disc_conditionintervals_w_means"))

  • Summary
# model summary
print(m.bet.ordinal4)
##  Family: acat 
##   Links: mu = logit; disc = log 
## Formula: ordinal_bet ~ (1 + lo_ground_truth | worker_id) + condition * lo_ground_truth 
##          disc ~ 0 + condition
##    Data: model_df_ordinal (Number of observations: 780) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 39) 
##                                Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept)                      3.03      0.73     1.85     4.70
## sd(lo_ground_truth)                1.02      0.25     0.62     1.60
## cor(Intercept,lo_ground_truth)     0.01      0.21    -0.41     0.42
##                                Eff.Sample Rhat
## sd(Intercept)                         390 1.00
## sd(lo_ground_truth)                   591 1.00
## cor(Intercept,lo_ground_truth)        696 1.00
## 
## Population-Level Effects: 
##                                            Estimate Est.Error l-95% CI
## Intercept[1]                                   0.07      0.89    -1.62
## Intercept[2]                                   1.93      0.91     0.24
## conditionintervals_w_means                     0.16      1.26    -2.32
## conditionmeans_only                           -0.33      1.37    -3.18
## lo_ground_truth                                1.08      0.32     0.50
## conditionintervals_w_means:lo_ground_truth     0.08      0.52    -0.82
## conditionmeans_only:lo_ground_truth            0.51      0.54    -0.39
## disc_conditionintervals_w_means               -0.60      0.36    -1.36
## disc_conditionmeans_only                      -0.73      0.28    -1.29
##                                            u-95% CI Eff.Sample Rhat
## Intercept[1]                                   2.00        341 1.00
## Intercept[2]                                   3.93        344 1.00
## conditionintervals_w_means                     2.60        355 1.00
## conditionmeans_only                            2.27        493 1.00
## lo_ground_truth                                1.76        572 1.00
## conditionintervals_w_means:lo_ground_truth     1.17        426 1.00
## conditionmeans_only:lo_ground_truth            1.71        592 1.00
## disc_conditionintervals_w_means                0.08        370 1.00
## disc_conditionmeans_only                      -0.20        364 1.01
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Fit the Adjacent Category Model with Category-Specific Effects and Unequal Variances

Now we assume that predictors have different effects on the proportion of each category of responses, and we assume that conditions have unequal variances.

# fit the model
m.bet.ordinal5 <- brm(
  bf(ordinal_bet ~ (1 + lo_ground_truth|worker_id) + cs(condition) + lo_ground_truth + condition:lo_ground_truth) + # cs() denotes category specific effects
  lf(disc ~ 0 + condition, cmc = FALSE), # disc ~ 1 / sd
  data = model_df_ordinal,
  family = acat("logit"),
  inits = 1, chains = 2, cores = 2,
  control = list(adapt_delta = 0.999, max_treedepth=15),
  file = "stan/m_bet_ordinal5"
)

Check diagnostics:

  • Trace plots
# trace plots
plot(m.bet.ordinal5)

  • Pairs plot
# pairs plot (too many things to view at once, so we've grouped them)
# hyperparameters
pairs(m.bet.ordinal5, pars = c("sd_worker_id__Intercept",
                               "sd_worker_id__lo_ground_truth",
                               "cor_worker_id__Intercept__lo_ground_truth"))

# pairs plot (too many things to view at once, so we've grouped them)
# threshold effects
pairs(m.bet.ordinal5, pars = c("b_Intercept",
                              "b_lo_ground_truth", # HOPs condition
                              "bcs_conditionmeans_only",
                              "b_conditionmeans_only:lo_ground_truth",
                              "bcs_conditionintervals_w_means",
                              "b_conditionintervals_w_means:lo_ground_truth"))

# pairs plot (too many things to view at once, so we've grouped them)
# discriminability effects
pairs(m.bet.ordinal5, pars = c("b_disc_conditionmeans_only",
                              "b_disc_conditionintervals_w_means"))

  • Summary
# model summary
print(m.bet.ordinal5)
##  Family: acat 
##   Links: mu = logit; disc = log 
## Formula: ordinal_bet ~ (1 + lo_ground_truth | worker_id) + cs(condition) + lo_ground_truth + condition:lo_ground_truth 
##          disc ~ 0 + condition
##    Data: model_df_ordinal (Number of observations: 780) 
## Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 39) 
##                                Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept)                      4.73      1.55     2.55     8.33
## sd(lo_ground_truth)                1.40      0.41     0.82     2.37
## cor(Intercept,lo_ground_truth)    -0.15      0.21    -0.54     0.27
##                                Eff.Sample Rhat
## sd(Intercept)                         273 1.00
## sd(lo_ground_truth)                   323 1.00
## cor(Intercept,lo_ground_truth)        568 1.00
## 
## Population-Level Effects: 
##                                            Estimate Est.Error l-95% CI
## Intercept[1]                                   0.58      1.30    -1.93
## Intercept[2]                                   2.06      1.32    -0.38
## lo_ground_truth                                1.17      0.43     0.42
## lo_ground_truth:conditionintervals_w_means     1.14      1.02    -0.39
## lo_ground_truth:conditionmeans_only            1.13      0.98    -0.44
## disc_conditionintervals_w_means               -1.23      0.36    -1.92
## disc_conditionmeans_only                      -1.11      0.36    -1.85
## conditionintervals_w_means[1]                  2.37      2.19    -1.69
## conditionintervals_w_means[2]                 -2.96      2.63    -8.98
## conditionmeans_only[1]                        -0.68      2.11    -5.25
## conditionmeans_only[2]                        -0.90      2.25    -6.02
##                                            u-95% CI Eff.Sample Rhat
## Intercept[1]                                   3.45        327 1.00
## Intercept[2]                                   4.90        327 1.00
## lo_ground_truth                                2.11        464 1.00
## lo_ground_truth:conditionintervals_w_means     3.31        278 1.00
## lo_ground_truth:conditionmeans_only            3.40        302 1.00
## disc_conditionintervals_w_means               -0.55        258 1.00
## disc_conditionmeans_only                      -0.43        275 1.00
## conditionintervals_w_means[1]                  7.17        421 1.00
## conditionintervals_w_means[2]                  1.17        188 1.01
## conditionmeans_only[1]                         3.28        345 1.00
## conditionmeans_only[2]                         2.85        327 1.01
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

The pairs plots for this last model are showing more correlation between parameters than previous models. This gives us some indication that some of the parameters in this most complex model are capturing redundant information.

Model Comparison

Each of the models above seem to fit similarly well, so we’ll compare them using the widely applicable information criterion (WAIC) to find the model which strikes the best balance of fitting the data without overfitting.

waic(m.bet.ordinal, m.bet.ordinal2, m.bet.ordinal3, m.bet.ordinal4, m.bet.ordinal5)
##                                    WAIC    SE
## m.bet.ordinal                   1011.31 40.26
## m.bet.ordinal2                  1012.60 40.17
## m.bet.ordinal3                  1010.73 40.58
## m.bet.ordinal4                  1016.68 39.66
## m.bet.ordinal5                  1010.98 39.90
## m.bet.ordinal - m.bet.ordinal2    -1.29  4.48
## m.bet.ordinal - m.bet.ordinal3     0.58  7.28
## m.bet.ordinal - m.bet.ordinal4    -5.37  5.95
## m.bet.ordinal - m.bet.ordinal5     0.33  7.69
## m.bet.ordinal2 - m.bet.ordinal3    1.86  5.58
## m.bet.ordinal2 - m.bet.ordinal4   -4.08  4.84
## m.bet.ordinal2 - m.bet.ordinal5    1.62  7.07
## m.bet.ordinal3 - m.bet.ordinal4   -5.94  7.83
## m.bet.ordinal3 - m.bet.ordinal5   -0.25  5.60
## m.bet.ordinal4 - m.bet.ordinal5    5.70  5.88

As a rule of thumb the lower the WAIC the better the model. However, it is important to consider differences in WAIC in context. Information criteria trade off a KL divergence measure of fit agains the number of parameters in the model such that only a relatively large difference in WAIC would indicate that a more complex model is worth the risk of overfitting due to additional parameters. We should really only interpret a difference in WAIC if it is greater than the standard error of that difference. Thus, we can conclude that modeling unequal variances (models 3 and 5) leads to some risk of overfitting and that using an adjacent category model (with or without category specific effects, models 2 and 4, respectively) does not lead to substantial improvement over the cumulative model. Considering the cumulative model has a more robust theoretical motivation (see https://psyarxiv.com/x8swp/), we should probably stick to the simplest of the ordinal regression models we tried.

Checking Predictions

For checking the quality of our model predictions, we’ll run with the cumulative model.

Let’s check out a posterior predictive distribution for bets.

# posterior predictive check
model_df_ordinal %>%
  select(lo_ground_truth,condition,sd_diff,worker_id) %>%
  add_predicted_draws(m.bet.ordinal, prediction = "ordinal_bet", seed = 1234) %>%
  ggplot(aes(x=ordinal_bet)) +
  geom_bar(fill = "black") +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for bets") +
  theme(panel.grid = element_blank())

How does this compare to the actual distribution of bet amounts?

# posterior predictive check
model_df_ordinal %>%
  ggplot(aes(x=ordinal_bet)) +
  geom_bar(fill = "black") +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Empirical distribution of bets") +
  theme(panel.grid = element_blank())

This model does pretty well at recovering the proportion of each category of bet.

How do these posteriors break down by condition and ground truth probability of victory?

Let’s define the optimal bet on each trial so we have a point of reference for evaluating bets.

# call the optimal bet function on our data
optimal_bets <- model_df_ordinal %>%
  select(condition, lo_ground_truth, sd_diff, worker_id) %>%
  mutate(
    ground_truth = plogis(lo_ground_truth),
    optimal_bet = if_else(ground_truth < .34, 1, if_else(ground_truth < .67, 2, 3))
  )
# count the number of each optimal bet per condition
optimal_bets_per_condition <- optimal_bets %>%
  count(condition, optimal_bet)

First, let’s get an overview of how users in each condition are betting relative to the proportion of each bet category we would expect under the optimal strategy.

model_df_ordinal %>%
  select(condition, worker_id, lo_ground_truth) %>%
  add_predicted_draws(m.bet.ordinal, prediction = "predicted_bet", seed = 1234) %>%
  group_by(condition, .draw, predicted_bet) %>%
  tally() %>% 
  ggplot(aes(x = predicted_bet, y = n, color = condition)) +
  geom_line(aes(group = .draw), alpha = .05) +
  geom_line(aes(x = optimal_bet), data = optimal_bets_per_condition, alpha = .3, color = "red", linetype = "dashed", show.legend = FALSE) +
  scale_color_brewer(type = "qual", palette = 1) +
  scale_x_discrete(limits = c(1, 2, 3)) +
  theme_bw() +
  facet_wrap(. ~ condition)

Next, let’s get a more detailed view of the model predictions by plotting them against the ground truth.

model_df_ordinal %>%
  group_by(condition, worker_id) %>%
  data_grid(lo_ground_truth = seq_range(lo_ground_truth, n = 101)) %>%
  add_fitted_draws(m.bet.ordinal, value = "P(bet | ground truth)", n = 100) %>%
  # turn `.category` (a factor) into a numeric value in {1, 2, 3}
  mutate(bet = as.numeric(as.character(.category))) %>%
  group_by(condition, worker_id, lo_ground_truth, .draw) %>%
  # calculate expected bet category
  summarise(ordinal_bet = sum(bet * `P(bet | ground truth)`)) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = ordinal_bet, color = condition, fill = condition)) +
  stat_lineribbon(.width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(data = model_df_ordinal, shape = 21, size = 2, alpha = 1/5) +
  geom_line(aes(y = optimal_bet), data = optimal_bets, alpha = .3, color = "red", linetype = "dashed", show.legend = FALSE) +
  scale_fill_brewer(type = "qual", palette = 1) +
  scale_color_brewer(type = "qual", palette = 1) + 
  scale_x_continuous(limits = quantile(plogis(model_df_ordinal$lo_ground_truth), c(0, 1))) +
  scale_y_discrete(limits = c(1, 2, 3)) +
  theme_bw() +
  facet_wrap(. ~ condition)